Toxicity Policies
Overview
Toxicity policies detect toxic user inputs or model responses. DynamoGuard’s default toxicity policy uses the taxonomy below; a code sketch of the taxonomy follows the lists.
The following behaviors and topics are considered ‘toxic’:
- Violent or hurtful topics
- Hateful, derogatory, or discriminatory speech
- Sexual references, including but not limited to sexual innuendos, acts, devices, and favors
- Illegal or criminal activities
- Encouraging or providing methods of self-harm or suicide
- Harassment or abusive behavior
- Sexist or racist discourse
- Implicit or explicit discrimination or stereotyping
- Dangerous, unethical, or harmful topics
- Creating, obtaining, or operating weapons
- Procurement or use of harmful substances
- Toxic, abusive, or offensive behaviors
- Biased, unfair, or ignorant remarks
- Untruthful or misleading statements
- Malice or manipulation
- Vulgar or offensive language
 
The following behaviors and topics are not considered ‘toxic’:
- General harmless queries
- Responsible information on violence and discrimination
- Responsible sexual education, health, or consent
- Factual resources for mental health
- Queries about resources for managing conflicts and reporting harassment
- Promoting diversity, fairness, and inclusion
- Crime prevention
- Responsible weapon ownership
- Responsible, harmless, and safe information on substances
- Explanations of ethical and responsible behavior
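
To make the taxonomy concrete, here is a minimal sketch of how it might be represented as a policy definition in code. The `ToxicityPolicy` class and the category strings are illustrative assumptions, not DynamoGuard’s actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class ToxicityPolicy:
    """Illustrative container for a toxicity policy's taxonomy (hypothetical)."""
    name: str
    toxic_categories: list[str] = field(default_factory=list)
    allowed_categories: list[str] = field(default_factory=list)


default_policy = ToxicityPolicy(
    name="default-toxicity",
    toxic_categories=[
        "violent or hurtful topics",
        "hateful, derogatory, or discriminatory speech",
        "sexual references",
        "illegal or criminal activities",
        "encouraging or providing methods of self-harm or suicide",
        # ...remaining 'toxic' categories from the list above
    ],
    allowed_categories=[
        "general harmless queries",
        "responsible information on violence and discrimination",
        "responsible sexual education, health, or consent",
        "factual resources for mental health",
        # ...remaining non-'toxic' categories from the list above
    ],
)
```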
 
 
Toxicity Policy Actions
Toxicity policies currently support two actions, flagging and blocking content; a sketch of how these actions might be enforced follows the list.
- Flag: allow user inputs and model outputs containing toxic content, but flag the input or output in the moderator view
- Block: block any user input or model output containing toxic content
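
The sketch below illustrates how the two actions might differ at enforcement time. `classify_toxicity`, `apply_policy`, and the refusal message are hypothetical stand-ins, not DynamoGuard’s actual API:

```python
from enum import Enum


class Action(Enum):
    """The two policy actions described above."""
    FLAG = "flag"
    BLOCK = "block"


def classify_toxicity(text: str) -> bool:
    """Stand-in classifier; in practice a trained guardrail model makes this call."""
    return any(term in text.lower() for term in ("hateful", "violent"))


def apply_policy(text: str, action: Action) -> str:
    """Apply the configured action to a user input or model output."""
    if not classify_toxicity(text):
        return text  # non-toxic content passes through unchanged
    if action is Action.FLAG:
        # Allow the content, but surface it for review in the moderator view.
        print(f"[moderator view] flagged: {text!r}")
        return text
    # Action.BLOCK: reject the content instead of returning it.
    return "This message was blocked by the toxicity policy."
```

In this sketch, `apply_policy(text, Action.FLAG)` returns the content unchanged while recording it for moderation, whereas `Action.BLOCK` substitutes a refusal; the same logic applies symmetrically to user inputs and model outputs.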